R in Action

Efficient data science with R

A demonstration by Md. Aminul Islam Shazid.

Grammar of graphics with ggplot2

Plots using grammar of graphics with ggplot2

  • ggplot2 is an R package that implements the grammar of graphics.
  • It builds polished graphics from a few simple building blocks.
  • Variables (features, columns) are mapped to elements of the plot called “aesthetics”, e.g., axes, colours, point sizes, and line types.
  • A geometry then turns that aesthetic mapping into a plot.

A simple example

library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins dataset

ggplot(data = penguins,
       mapping = aes(x = bill_length_mm,
                     y = flipper_length_mm)) +
    geom_point()
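
A minimal sketch of how the mapping and the geometry are separate pieces: one aesthetic mapping is stored once and then rendered by two different geometries. It assumes the penguins data comes from the palmerpenguins package; the object names base, scatter and contours are illustrative only.

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins dataset

# The mapping is declared once and stored in the plot object.
# At this point the plot has no layers, so nothing would be drawn yet.
base <- ggplot(penguins,
               aes(x = bill_length_mm, y = flipper_length_mm))

# Each geometry turns the same mapping into a different picture.
scatter  <- base + geom_point()       # one point per penguin
contours <- base + geom_density_2d()  # 2D density contours

scatter
contours
```

Rows with missing measurements are dropped with a warning when the plots are drawn.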

Adding a grouping variable

ggplot(penguins, 
       mapping = aes(x = bill_length_mm, 
                     y = flipper_length_mm, 
                     color = species)) + 
    geom_point()

Let’s add another dimension to the plot!

ggplot(penguins, 
       mapping = aes(x = bill_length_mm, 
                     y = flipper_length_mm, 
                     color = species, 
                     size = body_mass_g)) + 
    geom_point(alpha = 0.5)

Adding yet another dimension!

ggplot(penguins, 
       mapping = aes(x = bill_length_mm, 
                     y = flipper_length_mm, 
                     color = species, 
                     size = body_mass_g)) + 
    geom_point(alpha = 0.5) +
    facet_wrap(~island)

Comparing a variable across groups with boxplot

ggplot(penguins,
       mapping = aes(y = body_mass_g, 
                     x = species, 
                     fill = species)) +
    geom_boxplot(width = 0.2, show.legend = FALSE)

Violin plots as an alternative to boxplots

More informative than a boxplot alone: the violin shape also gives a sense of the density!

ggplot(penguins,
       mapping = aes(y = body_mass_g, 
                     x = species, 
                     fill = species)) +
    geom_violin(width = 0.5, show.legend = FALSE) + 
    geom_boxplot(fill = "white", width = 0.1, color = "black", show.legend = FALSE)

Bar diagrams

library(dplyr)  # provides count()

penguins |>
    count(island, species) |>
    ggplot() +
    aes(x = island, y = n, fill = species) +
    geom_bar(stat = "identity",
             position = position_dodge2(preserve = "single"))
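
A hedged alternative sketch: geom_col() is ggplot2's shorthand for a bar layer with stat = "identity", so it draws the same chart from pre-computed counts. The name counts is illustrative, and the penguins data is assumed to come from palmerpenguins.

```r
library(ggplot2)
library(dplyr)
library(palmerpenguins)  # assumed source of the penguins dataset

# One row per island/species combination that actually occurs.
counts <- penguins |>
    count(island, species)

# geom_col() uses the y values as bar heights directly.
# preserve = "single" keeps every bar the same width, even on
# islands where fewer species were observed.
p <- ggplot(counts, aes(x = island, y = n, fill = species)) +
    geom_col(position = position_dodge2(preserve = "single"))
p
```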

Line chart

Line charts show a trend or evolution over time.

ggplot() + 
    aes(x = time(AirPassengers), y = AirPassengers) + 
    geom_line()

Line chart with a trend line!

A LOESS smoother is added as the trend line; geom_smooth() uses it by default for fewer than 1,000 observations.

ggplot() + 
    aes(x = time(AirPassengers), y = AirPassengers) + 
    geom_line() + 
    geom_smooth()
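
A sketch of the same chart with the smoother spelled out, assuming one wants to control it rather than rely on the defaults. The time series is first converted to a plain data frame; the names air, year and passengers are illustrative.

```r
library(ggplot2)

# AirPassengers is a monthly time series that ships with base R.
# Converting it to a data frame gives plain numeric x and y columns.
air <- data.frame(
    year       = as.numeric(time(AirPassengers)),
    passengers = as.numeric(AirPassengers)
)

p <- ggplot(air, aes(x = year, y = passengers)) +
    geom_line() +
    # Request LOESS explicitly; a smaller span follows the data more
    # closely, a larger span gives a smoother curve.
    geom_smooth(method = "loess", formula = y ~ x, span = 0.75)
p
```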

Fast data exploration with DataExplorer

Basic info about a dataset

library(DataExplorer)
plot_intro(penguins)

Find missing values

plot_missing(penguins)
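
A rough base R sketch of the kind of numbers such a chart summarises (not DataExplorer's own implementation): the count and share of missing values per column. The name missing_summary and its column names are illustrative, and penguins is assumed to come from palmerpenguins.

```r
library(palmerpenguins)  # assumed source of the penguins dataset

# Number and share of missing values in each column, base R only.
missing_summary <- data.frame(
    feature     = names(penguins),
    num_missing = colSums(is.na(penguins)),
    pct_missing = colMeans(is.na(penguins)),
    row.names   = NULL
)

# Columns with the most missing values first.
missing_summary[order(-missing_summary$num_missing), ]
```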

Frequency distribution of all discrete variables

plot_bar(diamonds)

Frequency distribution by a discrete variable

plot_bar(diamonds, by = "cut")

Histogram of all continuous variables

plot_histogram(diamonds)

Kernel density of all continuous variables

plot_density(diamonds)

Boxplot of continuous variables grouped by a categorical variable

plot_boxplot(diamonds, by = "cut")

Scatterplot of one variable against all other continuous variables

plot_scatterplot(
    split_columns(diamonds)$continuous, 
    by = "price", 
    sampled_rows = 1000L
)

Quantile-quantile plot of all continuous variables

plot_qq(diamonds)

Correlogram

plot_correlation(split_columns(diamonds)$continuous)
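
A correlogram visualises a correlation matrix. As a rough sketch of the underlying numbers, base R's cor() can compute Pearson correlations for the numeric columns of diamonds (a dataset from ggplot2). This is an illustration only, not necessarily how plot_correlation() works internally; num_cols and corr are illustrative names.

```r
library(ggplot2)  # provides the diamonds dataset

# Keep only the numeric (continuous) columns.
num_cols <- vapply(diamonds, is.numeric, logical(1))

# Pairwise Pearson correlations between those columns.
corr <- cor(diamonds[, num_cols])
round(corr, 2)
```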

Publication-ready tables with gtsummary